earned patent term adjustment. See 37 CFR 1.704(b).

Status 

1) [X] Responsive to communication(s) filed on 05 July 2006.
2a) [ ] This action is FINAL. 2b) [X] This action is non-final.

3) [ ] Since this application is in condition for allowance except for formal matters, prosecution as to the merits is
closed in accordance with the practice under Ex parte Quayle, 1935 C.D. 11, 453 O.G. 213.

Disposition of Claims 

4) [X] Claim(s) 1-33 is/are pending in the application.

4a) Of the above claim(s) 2, 4, 12, 14, 17, 19, 22, 24, 27, 28, 31 and 33 is/are withdrawn from consideration.

5) [ ] Claim(s) is/are allowed.

6) [X] Claim(s) 1, 3, 5-11, 13, 15, 16, 18, 20, 21, 23, 25, 26, 29, 30 and 32 is/are rejected.

7) [ ] Claim(s) is/are objected to.

8) [ ] Claim(s) are subject to restriction and/or election requirement.

Application Papers 

9) [X] The specification is objected to by the Examiner.

10) [X] The drawing(s) filed on 01 July 2003 is/are: a) [X] accepted or b) [ ] objected to by the Examiner.

Applicant may not request that any objection to the drawing(s) be held in abeyance. See 37 CFR 1.85(a).
Replacement drawing sheet(s) including the correction is required if the drawing(s) is objected to. See 37 CFR 1.121(d).

11) [ ] The oath or declaration is objected to by the Examiner. Note the attached Office Action or form PTO-152.

Priority under 35 U.S.C. § 119 

12) [ ] Acknowledgment is made of a claim for foreign priority under 35 U.S.C. § 119(a)-(d) or (f).
a) [ ] All b) [ ] Some * c) [ ] None of:

1. [ ] Certified copies of the priority documents have been received.

2. [ ] Certified copies of the priority documents have been received in Application No. ____.

3. [ ] Copies of the certified copies of the priority documents have been received in this National Stage
application from the International Bureau (PCT Rule 17.2(a)).
* See the attached detailed Office action for a list of the certified copies not received. 
Status of Application, Amendments and/or Claims
Applicant's election without traverse of Group I (claims 1-32) and election of
modifications at positions 223, 225, 226, 237 and 269 in the reply filed 05 July 2006 is
acknowledged. In addition, all other claims reciting elected modifications (and
combinations of those elected modifications) will be considered.

Claims 2, 4, 12, 14, 17, 19, 22, 24, 27, 28, 31 and 33 are withdrawn from further
consideration pursuant to 37 CFR 1.142(b) as being drawn to a nonelected Group (or
elected modification), there being no allowable generic or linking claim. Election was
made without traverse in the reply filed on 17 April 2006.

Claims 1, 3, 5-11, 13, 15, 16, 18, 20, 21, 23, 25, 26, 29, 30 and 32 are under
examination.

Inventorship 

In view of the papers filed 16 April 2004, it has been found that this 
nonprovisional application, as filed, through error and without deceptive intent, 
improperly set forth the inventorship, and accordingly, this application has been 
corrected in compliance with 37 CFR 1.48(a). The inventorship of this application has
been changed by the addition of Shannon A. Marshall. 

The application will be forwarded to the Office of Initial Patent Examination 
(OIPE) for issuance of a corrected filing receipt, and correction of Office records to 
reflect the inventorship as corrected. 
Information Disclosure Statement
The information disclosure statement(s)(IDS) filed 16 August 2004 and 30 June 
2006 were received and comply with the provisions of 37 CFR §§1.97 and 1.98. They 
have been placed in the application file and the information referred to therein has been 
considered as to the merits. 

Sequence Rules 

The specification is not in compliance with 37 CFR 1.821-1.825 of the Sequence
Rules and Regulations. When the description of a patent application discusses a 
sequence listing that is set forth in the "Sequence Listing" in accordance with paragraph 
(c) of the Sequence Rules and Regulations, reference must be made to the sequence 
by use of the assigned identifier (SEQ ID NO:), in the text and claims of the patent 
application. 

37 CFR 1.821(a) presents a definition for nucleotide and/or amino acid 
sequences. This definition sets forth limits in terms of numbers of amino acids and/or 
numbers of nucleotides, at or above which compliance with the sequence rules is 
required. Nucleotide and/or amino acid sequences as used in 37 CFR 1.821 through
1.825 are interpreted to mean an unbranched sequence of four or more amino acids or
an unbranched sequence of ten or more nucleotides. Please see MPEP section 
2422.01. 
The specification refers to sequences in Figure 2A, but does not identify the
sequences by their sequence identifiers. Sequences appearing in drawings should be
referenced in the corresponding Brief Description thereof. See 37 C.F.R. §1.58(a) and
§1.83. Appropriate correction is required.

Applicant must submit a response to this Office Action and compliance 
with the sequence rules within the statutory period set for response to this Office 
Action. 



Claim Rejections - 35 U.S.C. § 112 

The following is a quotation of the first paragraph of 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall
set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1, 3, 5-11, 13, 15, 16, 18, 20, 21, 23, 25, 26, 29, 30 and 32 are rejected 
under 35 U.S.C. 112, first paragraph, because the specification, while being enabling 
for: 

a variant RANKL protein wherein said variant RANKL comprises modifications at 
positions R223M, R223E, R223Q, H225T, H225N, H225E, H225R, E226Q, E226D, 
E226R, Q237T, Q237K, Q237E, E269R, E269T, E269Q and E269K in combination with 
mutation C221S/I247E, 

does not reasonably provide enablement for: 



Application/Control Number: 10/611,363 Page 5

Art Unit: 1647 

a variant RANKL protein wherein said variant RANKL comprises modifications at 
positions R223, H225, E226, Q237, E269. 

The specification does not enable any person skilled in the art to which it 
pertains, or with which it is most nearly connected, to make and/or use the invention 
commensurate in scope with these claims. 

The instant specification teaches that normal bone remodeling is a process in
which new bone deposition by osteoblasts is balanced through bone resorption by
osteoclasts. RANK is activated by the binding of its ligand, RANKL, which leads to
differentiation, survival and fusion of pre-osteoclasts to form active bone-resorbing
osteoclasts (page 1). The specification teaches that the present invention is directed at
generating novel variants of human RANKL protein, comprising the extracellular 
domains of RANKL, which behave as RANKL antagonists or superagonists and 
modifications that confer soluble expression in E. coli. 

The specification states that it has been observed that human RANKL forms 
inclusion bodies when expressed in E. coli. Soluble expression allows for efficient and 
cost-effective production and manufacturing of human RANKL variants (page 3, line 25- 
page 4, line 5). Thus, it is important that variant RANKL proteins are soluble. The 
specification teaches that only specific RANKL variants (C221S/I247A, C221S/I247D, 
C221S/I247K, C221S/I247Q and C221S/I247E) showed soluble expression in bacteria 
(page 39, Table 1). The specification teaches the construction of a RANKL variant 
library, comprising the solubility-imparting modification C221S/I247E. The specification
teaches the classification of RANKL variants. Specific RANKL variants exhibited
non-agonistic activity (i.e. inhibition of osteoclastogenesis with or without RANK receptor
binding and/or inhibition of OPG binding) (Table 2 and pages 44-47).

It is known to those skilled in the art that certain positions in a sequence are 
critical to the protein's structure/function relationship, e.g. such as various sites or 
regions directly involved in binding, activity and in providing the correct three- 
dimensional spatial orientation of binding and active sites. These or other regions may 
also be critical determinants of antigenicity. These regions can tolerate only relatively 
conservative substitutions or no substitutions (see Wells, 1990, Biochemistry 29:8509- 
8517; Ngo et al., 1994, The Protein Folding Problem and Tertiary Structure Prediction, 
pp. 433-440 and 492-495). For instance, a point mutation change from glutamic acid to
aspartic acid at position 226 enables a variant RANKL protein to go from non-binding to
binding of RANK receptor. A point mutation change from glutamic acid to threonine at
position 269 still enables a variant RANKL protein to bind RANK receptor but inhibits the
variant from binding OPG. It would be apparent to one skilled in the art that the effects of
these types of changes are largely unpredictable as to which ones have a significant 
effect versus not. The instant specification teaches specific variant RANKL proteins, 
which are suitable. Therefore, the recitation of any RANKL variant protein results in an 
unpredictable and therefore unreliable correspondence between the claimed 
biomolecule and the indicated similar biomolecule of known function and therefore lacks 
support regarding enablement. 

Lastly, claim 32 recites, "a pharmaceutical composition comprising a variant
RANKL protein according to claim 1 and a pharmaceutical carrier" and thus reads on
use of that composition for treatment/therapy. The specification fails to disclose a direct
correlation (working examples, animal models, etc.) between the use of the instant 
invention and a method for treatment in subjects. Specific RANKL variants were able to 
inhibit osteoclastogenesis in vitro. This activity is not predictive of the activity RANKL 
variants might have in vivo. Freshney (Culture of Animal Cells, A Manual of Basic 
Technique, Alan R. Liss, Inc., 1983, New York, p4) teaches that it is recognized in the 
art that there are many differences between cultured cells and their counterparts in vivo. 
These differences stem from the dissociation of cells from a three-dimensional 
geometry and their propagation on a two-dimensional substrate. Specific cell 
interactions characteristic of histology of the tissue are lost. The culture environment 
lacks the input of the nervous and endocrine systems involved in homeostatic regulation 
in vivo. Without this control, cellular metabolism may be more constant in vitro but may 
not be truly representative of the tissue from which the cells were derived. This has 
often led to tissue culture being regarded in a rather skeptical light. Thus, it could not be 
predicted that the cell culture data presented in the specification would be in any way 
correlative with therapeutic agents for in vivo treatments. 

Due to the large quantity of experimentation necessary to generate the infinite 
number of derivatives recited in the claims and screen same for activity and the large 
quantity of experimentation necessary to show a correlation between a pharmaceutical 
composition comprising a RANKL variant and treatment of a specific disease/condition 
(including amounts and routes of administration for treatment in mammals), the absence 
of working examples directed to same, the complex nature of the invention, the state of 
the prior art which establishes the unpredictability of the effects of mutation on protein
structure and function, and the breadth of the claims which fail to recite any functional 
limitations, undue experimentation would be required of the skilled artisan to make 
and/or use the claimed invention in its full scope. 

The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

Claims 1, 3, 5-11, 13, 15, 16, 18, 20, 21, 23, 25, 26, 29, 30 and 32 are rejected
under 35 U.S.C. 112, second paragraph, as being indefinite for failing to particularly
point out and distinctly claim the subject matter which applicant regards as the
invention. The instant claims are indefinite in the recitation of amino acid positions in
the absence of a referenced SEQ ID NO:. It is not clear what sequence is intended by
the claims and thus the metes and bounds of the claims cannot be determined by one
skilled in the art.

Claim Objections 

Claim 3 is objected to because of the following informalities: The word 
"substitution" is misspelled. Appropriate correction is required. 



Conclusion 

No claims are allowed. 
Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Regina M. DeBerry whose telephone number is (571)
272-0882. The examiner can normally be reached on 9:00 a.m.-6:30 p.m.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Brenda G. Brumback can be reached on (571) 272-0961. The fax phone
number for the organization where this application or proceeding is assigned is
571-273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 
Additivity of Mutational Effects in Proteins

[illegible] protein-protein interactions (Laskowski et al., [illegible]) [illegible] the overall free
energy associated with [illegible] properties (usually [illegible] kcal/mol). Thus, it is possible to
modulate protein function by mutation [illegible]. There is now a large data base for free energy
changes that result when single mutants are combined. A review of [illegible] data shows that, in
the majority of cases, the [illegible] free energy change [illegible] interactions or structural
perturbations [illegible] large deviations from simple additivity can result from entropic effects
(Jencks, 1981). [illegible] on enzyme activity, similar conclusions [illegible] mutations affecting
protein-protein interactions [illegible].
Additivity Relationships

The change in free energy of a functional property caused by a mutation at site X is typically
expressed [illegible] $\Delta\Delta G_{(X)}$. [illegible] of a double mutant (designated
$\Delta\Delta G_{(X,Y)}$) [illegible]

$$\Delta\Delta G_{(X,Y)} = \Delta\Delta G_{(X)} + \Delta\Delta G_{(Y)} + \Delta G_I \qquad (1)$$

[illegible] between side chains at sites X and Y [illegible] was a large discrepancy, suggesting
the [illegible]. The strengths of noncovalent interactions are strongly dependent [illegible] and
$1/r^6$, respectively (for review see Fersht, 1985). Thus, when the side chains at sites X and Y
[illegible] another and assuming no [illegible], $\Delta G_I$ should be negligible and [illegible]

$$\Delta\Delta G_{(X,Y)} \approx \Delta\Delta G_{(X)} + \Delta\Delta G_{(Y)} \qquad (2)$$

This situation, here referred to as simple additivity, is generally observed except where side
chains are close to each other or when one or both of the mutants change the rate-limiting step
or reaction mechanism. These principles [illegible]

0006-2960/90/0429-8509$02.50/0
[Figure 1 plot; axis label: $\sum \Delta\Delta G_T^{\ddagger}$ components]

FIGURE 1: Plot of the changes in transition-state stabilization energies
for the multiple mutant versus the sum for the component mutants.
Data are taken from Table I and represent mutants from subtilisin,
tyrosyl-tRNA synthetase, trypsin, DHFR, and glutathione reductase,
where mutant or wild-type side chains should not contact one another.
The dashed line has a slope of 1, and the solid line is a best fit to all the data.

Changes in transition-state stabilization energy ($\Delta\Delta G_T^{\ddagger}$)
caused by a mutation can be calculated from eq 3 (Wilkinson
et al., 1983), in which R is the gas constant, T is the absolute
temperature, $k_{cat}$ is the turnover number, and $K_M$ is the Michaelis
constant for the mutant and wild-type enzyme against a fixed substrate.

$$\Delta\Delta G_T^{\ddagger} = -RT \ln \frac{(k_{cat}/K_M)_{\text{mutant}}}{(k_{cat}/K_M)_{\text{wild-type}}} \qquad (3)$$

$\Delta G_T^{\ddagger}$ represents the change in free energy
to reach the transition-state complex (E·S‡) from the free
enzyme and substrate (E + S).
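
As a rough illustration of eq 3, the sketch below converts a ratio of kcat/KM values into a change in transition-state stabilization energy. The function name, the default temperature of 298.15 K, and the example ratio are assumptions made for illustration; none of them comes from the source.

```python
import math

# Gas constant in kcal/(mol*K); the source reports energies in kcal/mol.
R_KCAL_PER_MOL_K = 1.987e-3


def ddg_transition_state(kcat_km_mutant, kcat_km_wild_type, temperature_k=298.15):
    """Return -RT ln[(kcat/KM)mutant / (kcat/KM)wild-type] in kcal/mol (eq 3).

    The default temperature is an assumption; use the actual assay temperature.
    """
    if kcat_km_mutant <= 0 or kcat_km_wild_type <= 0:
        raise ValueError("kcat/KM values must be positive")
    ratio = kcat_km_mutant / kcat_km_wild_type
    return -R_KCAL_PER_MOL_K * temperature_k * math.log(ratio)


# Hypothetical example: a mutant with one tenth of the wild-type kcat/KM.
example = ddg_transition_state(1.0e4, 1.0e5)
print(f"{example:+.2f} kcal/mol")
```

With this sign convention a less efficient mutant gives a positive value (a tenfold loss is roughly +1.4 kcal/mol at 25 °C), and a more efficient mutant gives a negative value.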

To analyze the proposition that the interaction energy term,
$\Delta G_I$, is relatively small when the sites of mutation (X and
Y) are remote to one another, $\Delta\Delta G_T^{\ddagger}$ values were collected
from the literature where side-chain substitutions in the
multiple mutant are beyond van der Waals contact (> 4 Å
distant) from each other (Table I). There are at least 25
examples distributed across five different enzymes where
$\Delta\Delta G_T^{\ddagger}$ values can be calculated for the individual and multiple
mutants assayed in at least two different ways. Among these
are examples where electrostatic interactions, hydrogen
bonding, and steric and hydrophobic effects have been altered
separately or in combination with others. The X-ray structures
of the wild-type protein show that the wild-type side chains
are not in contact. Modeling suggests the mutant side chains
are beyond possible van der Waals contact unless the mutant
side chains were to cause significant changes in the overall
protein structure. Such large changes are rarely observed in
structures of site-specific mutant proteins (Katz & Kossiakoff,
1986; Alber et al., 1987; Howell et al., 1986; Wilde et al.,
1988) or even highly variant natural proteins (Chothia & Lesk,
1986).

A collective plot of the sum of the $\Delta\Delta G_T^{\ddagger}$ values for the
component mutants versus the corresponding multiple mutant
(Table I) gives a remarkably strong correlation ($R^2$ = 0.92)
with a slope near unity (Figure 1). The simplest interpretation
is that the interaction term, $\Delta G_I$, is small compared to the
overall effects on $\Delta\Delta G_{T(X,Y)}^{\ddagger}$. It is formally possible that there
are large and compensating effects between side chains X and
Y that systematically lead to small net values for $\Delta G_I$.
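
The comparison behind Figure 1 can be sketched as a straight-line fit of the multiple-mutant values against the sums for the component mutants. The sketch below assumes ordinary least squares (the source says only "best fit"), and the data points are hypothetical rather than values from Table I.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept, with R^2."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical (sum of components, multiple mutant) pairs in kcal/mol.
sums = [-2.0, -1.0, 0.5, 2.0, 4.0]
multiples = [-1.8, -1.1, 0.4, 2.2, 3.7]
slope, intercept, r_squared = fit_line(sums, multiples)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.2f}")
```

A slope near 1 with a high R^2 is the pattern the text describes as simple additivity; a slope well below 1 would indicate that the multiple mutant falls short of the summed single-mutant effects.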

There are some notable exceptions that weaken the correlation
within the data set (Table I). In particular, combining
the R204L mutation in Escherichia coli glutathione reductase
gives a less than additive effect, especially when combined with
another mutant, R198M [illegible] residues are not in direct contact
[illegible] the largest discrepancies are [illegible] with NADPH as
compared to NADH. [illegible] of the $\Delta\Delta G_T^{\ddagger}$ values for two
positively charged mutants in subtilisin (D99K and E156K) [illegible]
effect on the multiple mutant when assayed with an Arg but
not with a Phe substrate (Russell & Fersht, 1987). Such
discrepancies are not too surprising because charge-charge
interactions fall off as 1/r and can exhibit long-range effects
in proteins [for example, see Russell and Fersht (1988)]. The
physical basis for other large discrepancies not involving
[illegible] unexpectedly large structural changes or changes in enzyme
mechanism (see below).

These additivity tests are not particularly dominated by one
of the single mutants in the sum. The average contribution
(±SE) for the most dominant mutant in each sum calculated
from the 69 additivity tests given in Table I is only 68%
(±15%) of the total sum (theoretical is ~50%). Furthermore,
the plot in Figure 1 is not analogous to graphs of correlated
variables, where A is plotted versus the sum of A + B, because
in Figure 1 the values on the y-axis are determined independently
from those on the x-axis.

Complex Additivity in Transition-State
Stabilization: When $\Delta G_I \neq 0$

(A) Change in Interaction Energy between Sites X and Y.
Where residues X and Y are close enough to contact, it is more
likely that the $\Delta G_I$ term will be significant. There are 11
examples collectively from tyrosyl-tRNA synthetase and
subtilisin that fit this category (Table II).

A series of mutants in tyrosyl-tRNA synthetase at positions
48 and 51 (Carter et al., 1984; Lowe et al., 1985) show complex
additivity (Table II). His48 and Thr51 in the wild-type
structure are next to each other on adjacent turns of an α-helix.
His48 hydrogen bonds to the ribose ring oxygen of ATP while
Thr51 can make van der Waals contact with ATP. The T51P
mutation increases the catalytic efficiency of the enzyme in
some assays by more than -2 kcal/mol (Wilkinson et al.,
1984). However, when this mutation is combined with mutations
at position 48, the effects are not simply additive. An
X-ray structure of the T51P mutant indicates there are no
structural changes in the α-helix (Brown et al., 1987). Instead,
it is suggested that the T51P mutant is improved over wild
type because the wild-type enzyme contains a bound water in
the vicinity of Thr51 that disfavors substrate binding. Blow
and co-workers (Brown et al., 1987) argue that the change
in solvent structure propagated to position 48 may account for
the complex additivity. In the previous section, the double
mutant (H48G,T51A) exhibited nearly simple additivity
(Table I). Presumably, the smaller and less hydrophobic
alanine substitution at position 51 should not introduce as large
a change in solvent structure as the pyrrolidone ring of proline.
In the case of subtilisin (Table II), Glu156 is near the top
of the P1 binding crevice while Gly166 is at the bottom. In
the wild-type enzyme these sites do not make direct van der
Waals contact, but large side chains substituted at position
166 can be modeled to contact the residue at position 156. In
fact, X-ray structural analysis shows that an Asn side chain
at position 166 makes a good hydrogen bond with Glu156
(Bott et al., 1987). Moreover, all of the substitutions are polar
or charged, the energetics of which are expected to be the most
long range. Thus, the mutant side chains alter substantially
the intramolecular interactions between positions 156 and 166.
Table I: Comparison of Sums of $\Delta\Delta G_T^{\ddagger}$ from Component Mutants vs the Multiple Mutant Where the Mutant or Wild-Type Side Chains Do
Not Contact One Another

assay    component mutants    multiple mutant    component mutants    multiple mutant

Tyrosyl-tRNA Synthetase



ATP/PP, 
ATP/lRNA 
Tyr/PP, 
Tyr/lRNA 

atp/pt, 

< « * * f u\na 

" : tWPf; 

Tyr/tRNA 

ATP/lRNA 
ATP/Tyr 

ATP/PPj 
ATP/tRNA 

Tyr/Tyr 
ATP/Tyr 



K 
R 



C35G + II4BG' 

\f tife'V » f o^w%# 






' 1 <£v « 1 Art 


+2.24 


+2.30 


+ 1.05 +1.13 


+2.18 


+ 1.68 


+1.1J +1.12 


+2.26 


+2 32 


+0.32 +1.12 


+ 1.45 










+ 5.20 -1.91 


-0.7 i 


-i.M 




JC 


— i.oo 


iiln e*4 








»^46 — 


. 0.74 


+0.32 +0.50 


+0.82 


+0.2! 


C35G +T5IC* 






+1.05 -0.93 


+0.12 


-0.22 


+ 1J4 -0.91 


+0.23 


.-0.13 


H48N +T5IA' 






+0.26 -OJfi 


-0.12 


+0.04 


. -0.13 -0.32 


-0.45 


-0.37 


T40A + H4SC 






+5.02 +3.1' 


+8.17 


+6.95 


+5.13 +2.44 


+7.57 


+6.67 


Rat Trypsin






G2I6A +G226A' . 






+2.75 +3 13 


+5.8S . 


+5.07 


+2.19 +4.91 


+7.10 


+5.90 


Dihydrofolate Reductase ($\Delta\Delta G_{\text{binding}}$)




F31V +L54</ 






+ 1.6 +2.9 


+4.5 


•14.5 


+2.2 +2.9 


+5.1 


+4.5 



R 

F 



F 
v 



Subtilisin BPN'
D99K + E156K

+ 1.29 +3.41 

+0.13 -0.49 -0.36 

EI56S. 
G166A + G169A. 

Y2I7L' 

-0.40 -1.46 -I-86 

* C.94 - J.Cj -0.09 

S24C 



4-2.74 
-0.42 



-1.70 



0 1 66A < 



MTX 



T; 

O 
A 
K 
M 
F 
Y 

E 
Q 

A 
K 
M 
F 
Y 



E 
Q 

A 
K 
M 
F 
Y 

R 
F 



r 
Y 



F 
Y 



F 
Y 



H64A 

+4.96 
+4.40 



-0.40 
+0.94 
E156S, 

G169A.+ h*7|A 
* Y217L HMA 



S24C. 



+4.96 
+4.40 



+ GK>6A 



Subtilisin BPN'



•156S + Y217L + G169A' 






F 


-1.43 


-0.87. 


-0.62 


-2.02 


-2.06 


Y 


-0.60 


-0.36 


-0.32 


-1.28 


-1.14 




-0.15 


-0.41 


-0.27. 


-0.83 


-0.92 




+ 1.70 


-0.08 


-0.30 


+ 1J2 


+0.87. 




-0.86 


-0.32 


-0.39 


-1.57 


-1.41 


NADH 


-0.61 


-0.29 


-0.66 


. -1.56 


-1.17 


N/VDPH 


0.24 


-0.12 


-0.41 


-0.77 


-0.59 




E156S + Y217L 








NADH . 


-1.43 


.^0.87 




-2.30 


-1.67 


NADPH 


-0.60 


-0.36 




-0.96 


-0.96 




-0.15 


-0.41 




-0.56 


-0.53 


NADH 


+ 1.70 


-0.08 




+1.62 


+1.33 


NADPH 


-0.86 


-0.32 




-1.18 


-Ml 




tO.6! 


-0.29 




-0.90 


-0.84 = 




-0.24 


-0.12 




-0.36 


-0.32 


NADH 


EI56S, 
Y217L 


+ C169A 








NADPH 


-1.67 


-0.62 




-2.29 


-2.06 




-0.96 


-0.32 




-1.28 


-1.14 


NADH 


-0.53 


-0.27 




-0.80 


-0.92 


NADPH 


+ 1.33 


-0.3C 




+ 1.03 


+0.87 




-Ml 


-0.39 




-1.50 


-1.41 




-0.84 


-0.66 




-1.50 


-1.17 


NADH 


-0.32 


-0.41 




-0.73 


-0.59 


NADPH 



+4.56 
+5.34 



+3.50 
+3.37 



+3.81 
+4.90 



+2.65 
+4.81 



+3 " 
*4.,. 

E. coli Glutathione Reductase
A179G + R198M

-1.10 -0.62 -1.72 
+0.08 +2.68 +2:76 
A179G + R204L X 
-1.10 +0.41 -0.69 
+0.08 . +2.42 +2.50 
R198M +R204L 

-0.62 +0.41 -0.21 
+7.68 +2.42 +5.|0 

A,7V V + R2 04t. 



-1,46 
-1.03 
S24C, 
H64A, 
GI69A, 
Y217L 

+4.21 -0.40 
+3.96 f0.94 
S24C, EI56S. 
H64A. + GI69A. 
G166A Y217L 
+4.11 -1.46 
+5.84 -1.03 
EI56S. 

S24C, .C166A, 

H64A 



+4.96 
+4.40 



G169A. 
Y2171. 
-I.*>6 
+0.02 



D99S + EI56S* 
+0.47 +0.77 
0 -0.62 



+ 1.24 
-0.62 



+ 1.52 
-0.52 



NADH 
NADPH 



-0.51 
+3.70 
AI79G, 
R204L 
-1.54 
+0.87 
A179G, 
RI98M 
-1.32 
+2.11 

R179G + RJ98M + R204L 
-1.10 . -0.62 +0.41 
+0.08 +2.68 . +2.42 



-1.10 
+0.08 

RI98M + 

-0.62 
+2.6R 

I1204L + 

+0.41 
+2.42 



-1.61 
+3.78 



-2.16 
+3.55 



-0.91 
+4.53 

-1.31 
+5.18 



+4.11 
+5.84 



+4.21 
+3v96 



+3.53 
+6.07 



+3.53 
+6.07 



+3.53 
+6.07 



-1.32 
+2.11 

-1.54 
+0.87 

-0.51 
+3.70 



-1.72 
+2.22 



-1.72 
+2.22 



-1.72 
+2,22 

-1,72 
+2.22 



i 



Carter et al. (1984). The assays refer to measurements of ATP-dependent pyrophosphate exchange (ATP/PPi) or tRNA charging (ATP/tRNA)
under saturating conditions for tyrosine and vice versa for Tyr/PPi exchange and Tyr/tRNA charging.
Lowe et al. (1985). The ATP/Tyr activation assay refers to formation of tyrosyl adenylate under saturating concentrations of tyrosine.
Jones et al. (1986).
Leatherbarrow et al. (1986). The ATP/Tyr and Tyr/Tyr activation assays refer to formation of tyrosyl adenylate under pre-steady-state
conditions, and $k_{cat}/K_M$ is calculated from $k_3/K_s$ for tyrosine and ATP, respectively.
Craik et al. (1985). The substrate was D-Val-Leu-(X)-aminofluorocoumarin where the P1 residue (X) is either Lys (K) or Arg (R).
Mayer et al. (1986). The ligand was either dihydrofolate (H2F) or methotrexate (MTX).
Wells et al. (1987a). The substrate was succinyl-L-Ala-L-Ala-L-Pro-L-(X)-p-nitroanilide where the P1 (X) residue (Schechter & Berger, 1967)
was either Glu (E), Gln (Q), Ala (A), Lys (K), Met (M), Phe (F), or Tyr (Y).
Russell and Fersht (1987). The substrate was benzoyl-L-Val-Gly-L-Arg-p-nitroanilide (R) or
succinyl-L-Ala-L-Ala-L-Pro-L-Phe-p-nitroanilide (F).
Carter et al. (1989). The substrate was [illegible]; X was either Phe (F) or Tyr (Y).
Scrutton et al. (1990). The assay followed the reduction of oxidized glutathione by NADH or NADPH.
Table II: Comparison of Sums of $\Delta\Delta G_T^{\ddagger}$ from Component Mutants
vs the Multiple Mutant Where the Mutant Side Chains Can Contact
One Another

assay    component mutants    sum    multiple mutant

Tyrosyl-tRNA Synthetase
H48G + T51P



ATP/PP, 

ATP/lRNA 

Tvr/PP; 

Tyr/tRNA 

A 1 r/Tyr 

Ty r /ATP 



+ 1.04 -1.91 


-CS7 


+ 1.07 


+ 1.13 -2.35 


-1.2? 


+0.77 


+ 1.12 -0.64 


4 0.48 


+ 1 0* 


+1.12 +0.50 


+ 1.63 


+0.P 


-r0.95 -l.vv 




+ 1.04 


=(US 


+0 69 


+0.82 


H48N + T51P 




-0.76 


+0.18 -1.99 


-1.81 


+0.36 -OJS 


-0.02 


-0.64 




-2.25 


-1.07 


N48G +T5IP 






+0.37 -094 


-0.57 


- +0.86 


+0.41 -I40O 


-0.59 


+0.45 


+I.i6 -1.05 


+0.21 


+O.90 


Q48G + TS1P 






-1.31 -1.09 


-2.40 


-1.12 


-2.05 -1.65 


-3.70 


-2.31 


-1 87 -1.85 


• j.72 


-2 23 


H48Q + T5IP 






+2.26 -1,99 


".-9.27 


+ 1.17 


+3.13 -0.38 


+2.75. 


+ I.4R 


+3.11 -2.23 


+0.33 


+!\26 


Subtilisin BPN' 






E1i6Q + Gl66iy 






-1.04 +1.27 


+0.23 


4-0.75- 


-0.45 +1.33 


+ 1.38 


+0.1* 


+2.15 +0.53 


+2.68 


+0.26 


E156S + G166D 






-0.59 +1.27 


+0.68 


+0.74 


-0:85 +1.83 


+0.98 


+0.66 


+1.68 +0.53 


+2.22 


+0.49 


E156Q + GI66N 






-1.71 -0.11 


-1.82 


^0.69 


-lX* +0.14 


-0.90 


-0.77 


-0.45 +0.18 


-0.27 


-i.io 


• +2.15 +0.48 


+2.73 


+1.16 


EI56S + G166N 




-1.44 -o.ii 


-1.55 


-0,51 


-0.59 . +0.14 


-0.45 


-0.85 


-0.85 . +0.18 


-0.67 


-0.78 


+ U8 • +0.48 


+2.16 


+ 1.26 


EI56S + G166K 






-1.44 -3.49 


^4.93 


-4.49 


-0.59 -1.03 


-1.62 


-0.95 


-0.85 -1.37 


-2.22 


-1.12 


+1.68 . +0.51 


+2.19 


+ 1.88 


E156Q + G166K 






-1.71 -3.49 


-5.20 


-4.49 


-1.04 -1.03 


-2.07 


-0.95 


-0.45 -1.37 


-1.82 


-1.12 


+2.15 +0.51 


+2.66 


+ l;88 
See Table I for description of assays. Lowe et al. (1985). Carter et
al. (1984). Wells et al. (1987b).

In these six examples there are large and systematic discrepancies
between the sum of the $\Delta\Delta G_T^{\ddagger}$ values for the single
mutants and those of the corresponding double mutant (Wells
et al., 1987b). In almost all cases, the sum of the $\Delta\Delta G_T^{\ddagger}$ values
for the single mutants is much greater than the value for the
multiple mutant. Nonetheless, the $\Delta\Delta G_T^{\ddagger}$ value predicted from
the sum of the single mutants does have the same sign as that
for the double mutant, so that the single mutants predict
qualitatively the effect on the multiple mutant.
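
To make this comparison concrete, the sketch below takes the interaction term as the multiple-mutant value minus the sum of the component values, and checks whether the sum and the multiple mutant share a sign. The four rows are the E156S + G166K entries as they read in the scanned Table II; they may carry scan errors, the substrate labels for the rows are not legible, and the function names are illustrative assumptions.

```python
def interaction_energy(ddg_x, ddg_y, ddg_xy):
    """Interaction term: multiple-mutant value minus the sum of the components."""
    return ddg_xy - (ddg_x + ddg_y)


def same_sign(a, b):
    """True when a and b are both positive or both negative."""
    return a * b > 0


# (component X, component Y, multiple mutant) in kcal/mol, as read from the
# scanned Table II rows for E156S + G166K; row order follows the scan.
rows = [
    (-1.44, -3.49, -4.49),
    (-0.59, -1.03, -0.95),
    (-0.85, -1.37, -1.12),
    (+1.68, +0.51, +1.88),
]

for ddg_x, ddg_y, ddg_xy in rows:
    total = ddg_x + ddg_y
    dg_i = interaction_energy(ddg_x, ddg_y, ddg_xy)
    print(f"sum={total:+.2f} multiple={ddg_xy:+.2f} "
          f"interaction={dg_i:+.2f} same_sign={same_sign(total, ddg_xy)}")
```

In each of these rows the sum is larger in magnitude than the multiple-mutant value but has the same sign, which matches the qualitative statement in the text.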

A plot (Figure 2) of the collective data set from Table II
is in contrast to that seen in Figure 1. The $\Delta\Delta G_T^{\ddagger}$ values for
the multiple mutants correlate more poorly with the sum of
the component single mutants ($R^2$ = 0.72). Moreover, the
slope of the line (0.61) is much below unity. This indicates
that the function of one residue is compromised by mutation
of another. Of the 40 additivity examples, the average contribution
of the most dominant single mutant to the sum of
the $\Delta\Delta G_T^{\ddagger}$ values is 71% (±13%) of the total. Thus (as in
Figure 1), both single mutants can contribute substantially
to free energy changes measured in the multiple mutant.
However, this data set is derived from mutations at only two
different sites on two different proteins.

[Figure 2 plot; axis label: $\sum \Delta\Delta G_T^{\ddagger}$ components]

FIGURE 2: Data are taken from Table II for mutants of subtilisin
or tyrosyl-tRNA synthetase where mutant or wild-type side chains
can contact each other. The dashed line represents a theoretical line
of unity slope, and the solid line represents the best fit.

In summary, complex additivity can be observed when
mutations at sites X and Y change the intramolecular interaction
energy between sites. This can be mediated by direct
steric, electrostatic, hydrogen-bonding, or hydrophobic interactions
or indirectly through large structural changes in the
protein, solvent shell, or electrostatic interactions. Complex
additivity is most likely to occur where the sites of mutation
are very close together and larger or chemically divergent side
chains are introduced.

(B) Mutations at Sites X or Y Change the Enzyme
Mechanism or Rate-Limiting Step. If the catalytic functions
of two or more residues are interdependent, then a mutation
of one residue can affect the functioning of the other(s). This
form of complex additivity is well illustrated for mutations in
the catalytic triad and oxyanion binding site of subtilisin
(Carter & Wells, 1988, 1990). In the catalytic mechanism
of subtilisin (Figure 3), the rate-limiting step in amide bond
hydrolysis is transfer of the proton from Ser221 to His64 with
nucleophilic attack upon the scissile carbonyl carbon. This
is accompanied by electrostatic stabilization of the protonated
imidazole by Asp32 and hydrogen bonding to the oxyanion
by the side chain of Asn155 and the main-chain amide of
Ser221. Mutational analysis shows that once the catalytic
Ser221 is mutated to Ala (S221A), additional mutations in
the triad or oxyanion binding site cause no further loss in
catalytic efficiency (Table III).

The S221A enzyme retains a catalytic activity that is still
10^4 above the solution hydrolysis rate (Carter & Wells, 1988).
It is proposed that this residual activity is derived from
remaining transition-state binding contacts outside of the
catalytic triad coupled with solvent attack upon the carbonyl
carbon from the face opposite position 221 (Carter & Wells,
1990). This proposal is based on a model showing that there
is no room for a water molecule near Ala221 once the substrate
is bound. Furthermore, conversion of Asn155 to Gly enhances
the activity of the S221A mutant by ~1.2 kcal/mol (Table III).
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Tabic lit: Comparison of Sums of AA<V from Component Mutants 
vs the ΔΔG_T‡ for Multiple Mutants in the Catalytic Triad and
Oxyanion Binding Site of Subtilisin BPN'^a

component mutants        ΔΔG_T‡ of components    sum      multiple mutant
S221A + H64A^b           +8.93  +8.84            +17.76   +8.83
S221A + D32A             +8.93  +6.52            +15.45   +8.36
H64A + D32A              +8.84  +6.52            +15.36   +7.48
S221A + H64A + D32A      +8.93  +8.84  +6.52     +24.29   +8.65
S221A + [H64A,D32A]      +8.93  +7.48            +16.40   +8.65
H64A + [S221A,D32A]      +8.84  +8.86            +17.70   +8.65
D32A + [S221A,H64A]      +6.52  +8.83            +15.35   +8.65
S221A + N155G^c          +8.93  +3.08            +12.01   +7.70



ES ES* E-Ac 

Figure 3: Schematic diagram of the mechanism of subtilisin showing the rate-limiting acylation step for hydrolysis of peptide bonds. Reproduced
with permission from Carter and Wells (1988). Copyright 1988 Macmillan.

activity could not go beyond the diffusion-controlled limit
(Albery & Knowles, 1976).

Additive Effects on Substrate Binding 

The analysis above considered changes in binding free
energies between the free enzyme and substrate (E + S) to yield
the bound transition-state complex (E·S‡). The steady-state
kinetic analysis for subtilisin and tyrosyl-tRNA synthetase is
such that the K_M values approximate the enzyme-substrate
dissociation constant K_S. Additivity analysis based on
calculations of ΔΔG_binding (from K_M values) or ΔΔG_cat (from
k_cat values) yields qualitatively the same results (not shown) as
shown in Tables I and II and Figures 1 and 2. Thus, deviations
from simple additivity are not systematically found in either
the energetics to form the E·S complex or those to reach E·S‡.

Additive Effects on Protein-Protein Interactions

The first clear examples of additive binding effects caused
by amino acid replacements in proteins were reported by
Laskowski et al. (1983) and reviewed by others (Ackers &
Smith, 1985; Horovitz & Rigbi, 1985). One hundred natural
variants of a proteinase inhibitor, the ovomucoid third domain,
have been isolated and sequenced from the eggs of different
bird species (Empie & Laskowski, 1982; Laskowski et al.,
1987). This is a nested set of proteins because for any one
of these avian inhibitors there is a close relative containing only
one or a few amino acid substitutions. Moreover, the
association constants (K_a) of these inhibitors with a variety of serine
proteinases vary over an enormous range (10*-fold). Laskowski
et al. (1983, 1989) have shown that the effect of a given residue
replacement on K_a is about the same irrespective of the
inhibitor scaffold the replacement is made in.

In addition to ovomucoid, four additivity examples have been
constructed from natural variants at the subunit interface of
tetrameric hemoglobin (Ackers & Smith, 1985). Three
additivity examples have been analyzed for interactions of hGH
with its receptor (B. C. Cunningham and J. A. Wells,
unpublished results) and one example for association of synthetic
variants of the RNase S peptide with RNase S protein
(Mitchinson & Baldwin, 1986). The entirety of this data set
is not tabulated because much on the ovomucoid inhibitors
and hGH is unpublished. Nonetheless, these researchers were
kind enough to provide their data formatted so it could be
plotted collectively in Figure 4. These data consist of 91
additivity examples (80 in ovomucoids alone), representing 22
multiple mutants across four different proteins, and span a
wide range of change in binding free energy (-10 to +7



a All enzymes were assayed with the substrate
succinyl-L-Ala-L-Ala-L-Pro-L-Phe-p-nitroanilide. b Carter and Wells
(1988). c Carter and Wells (1990).

This is consistent with the opposite-face solvent attack
mechanism of S221A, because the oxyanion (Figure 3) would
develop away from Asn155 and the N155G mutation improves
solvent accessibility to the scissile carbonyl carbon.

Complex additivity is also seen for subtilisin mutated at
positions 64 and 32. The double (H64A,D32A) and
corresponding single mutants show a linear dependence upon
hydroxide ion concentration (between pH 8 and 10) that may
reflect hydroxide assistance in the deprotonation of the Oγ
of Ser221 (Carter & Wells, 1988). Thus, once His64 is
converted to Ala, Asp32 is a liability, presumably by
electrostatic repulsion of hydroxide ion. (Note the -1.3 kcal/mol
improvement in ΔΔG_T‡ for the double mutant (H64A,D32A)
compared to H64A alone; Table III.)

In summary, if an enzyme mechanism relies upon cooperative
interaction between two or more residues, then multiple
mutations within this subset can result in large values for
ΔG_I. In fact, if the mechanism is changed substantially,
residues that were a catalytic asset can become a liability.
Simple additivity can also break down when one or more of
the mutations cause a change in the rate-limiting step. In an
extreme case, one may have a number of mutants in an enzyme
that enhance the activity, but the cumulative enhancement of
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, oiapayiccoccii nuclease 








V66L 4 G79S« 






GnHCI 


-0.2 -2.6 


-2.8 


-3.3 


trca 


+0.2 -1.9 


-2.7 


-3.6 




Y66L + G88V 






»/*"! 

VI Wl fV.1 


-0.2 -IX) 


- 1.2 


* 2.r 


urea 


+03 -0.9"- 


— 


-1.4. 




U8M A Ao*T 






uUHU 




w 


-2.8 


urea 


-0.7 -19 


_» - 






I58M + A90S 






GuHCI 


-0.6 -1.4 


-2.0 


la. . 


urea 


H>.7 -M 


-2.1. 






V66L + G79S + G88V 






GuHCI 


-0.2 . -2.6 -1.0 


-3.8 


-3.0 


urea 


+0.2 -19 -0:9 


r 3.6 


-3.4 




N-Terminal Domain of λ Repressor






G46A + G48A» 






thermal melt


+0.7 . +0.9 .. 


+1.6 


+1.1 




T4 Lysozyme








UC + C54V 






thermal melt


+U -0.7 


+0.5. 


+0.4 • 




I3C + C54T 






thermal melt 


+1J +03 


+1.5 


+1.5 




13C + C54T + R96H 






thermal melt 


+1.2 +0.3 -2.8 


-1.3 


-2.5 




I3CCS4T + R96H 


-1.3 




thermal melt 


+1.5 -2:8 


-2.5 




13C + C54T+ AI<cT 




-0.5. 


thermal melt 


+1.2 +0J -15 


0 




I3CC54T+ AI46T 




^0.5 


thermal melt 


+1.5 -1.5 


0 
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figure 4: Plot showing the sum of changes in free energies or binding 
at protein-protein interfaces for component mutants versus the
corresponding multiple mutant. Data represent interactions between
ovomucoid third domain and various serine proteases (□) (R. Wynn
and M. Laskowski, personal communication), regulatory interface
of a A hemoglobin (•) (Ackers & Smith, 1985), hGH and its receptor 
(stippled A) (B. Cunningham and J. Wells, personal communication),
and RNase S peptide and S protein (e) (Mitchinson & Baldwin,
1986). The dashed line represents a line of unity slope, and the solid
line is the best fit.

kcal/mol). The plot shows a very strong linear correlation (R²
= 0.96) with a slope near unity. Although the data for the
ovomucoid were not sorted to evaluate changes at
intramolecular contact sites, most are not expected to be in contact,
and all of the other examples represent noncontact sites. Thus,
the large data base derived from natural variants of ovomucoid
third domain, as well as a smaller number of examples from
several other proteins, indicates that multiple mutations at
protein-protein interfaces commonly produce simple additive
effects.

Additive Effects in DNA-Protein Interactions 

One of the clear advantages in analyzing DNA-protein
interactions is the ability to apply powerful selections that make
analysis by random mutational studies feasible. Additivity
in DNA-protein interactions was first demonstrated by
reversion analysis of λ repressor (Nelson & Sauer, 1985). A
mutation that decreased the binding affinity for the λ operator
site (K4Q) was reverted by mutations at several second sites
(E34K, G48S, and E83K). When these second-site revertants
were introduced into wild-type λ repressor, they caused
increases in affinity similar to those observed in the first-site
suppressor mutant (K4Q).

Functional independence for mutations at DNA-protein
contacts has been demonstrated by additive effects for mutants
of CAP (catabolite gene activator protein) and its operator
sequence (Ebright et al., 1987) as well as lac repressor and
its corresponding operator sequence (Ebright, 1986). Simple
additivity of mutational effects in the operator sequences for
Cro repressor (Takeda et al., 1989) and λ repressor (Sarai &
Takeda, 1989) has been most systematically demonstrated.
Simple additivity has also been reported for multiple mutations
in the lac repressor (Lehming et al., 1990). In fact, simple
additivity is so predictable in DNA-protein interactions that
the observation of complex additivity has been used to predict
specific DNA-protein contacts in the lac repressor-operator
complex (Ebright, 1986).

Additive Effects on Protein Stability 

The first systematic analysis of additive effects of
site-specific mutations on protein stability was reported by Shortle
and Meeker (1986). Five multiple mutants in staphylococcal



Bacteriophage f1 Gene V
V35I + I47V
GuHCl -0.4 -2.4 -2.8

Kringle-2 of t-PA
H64Y + R68G^e
thermal melt +2.9 +0.7 +3.6

Turkey Ovomucoid Third Domain
G32A + N28S^f
thermal melt +0.8 -0.5 +0.3

Y20H + N45-CHO
thermal melt -0.8 +0.3 -0.5

α Subunit of E. coli Trp Synthetase
Y175C + G211E^g
GuHCl -0.1 +0.3 +0.2



-2.9 



+3.4 



+0.2 



-1.3 



a Shortle and Meeker (1986). b Hecht et al. (1986). c Wetzel et al.
(1988). d Sandberg and Terwilliger (1989). e R. Kelley, personal
communication. f Otlewski and Laskowski (1990). N45-CHO refers
to a glycosylation of Asn45. g Hurle et al. (1986).

nuclease were constructed from a group of random single
mutants that were screened initially for their ability to affect
the stability of the enzyme in vivo. The component mutants
do not make direct contact with each other in the multiple
mutants. Generally, these variants exhibit nearly additive
effects except for the double mutant V66L,G88V (Table IV).
In addition to those of staphylococcal nuclease, additive effects
on the ΔΔG_unfolding (assayed by reversible denaturation) have
also been determined for the N-terminal domain of λ repressor
(one example; Hecht et al., 1986), the α-subunit of E. coli Trp
synthetase (one example; Hurle et al., 1986), T4 lysozyme (six
examples; Wetzel et al., 1988), the gene V product of
bacteriophage f1 (one example; Sandberg & Terwilliger, 1989),
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combined, b one of the most pwgrfal tools ia designing 
functional properties in proteins. This approach has been
remarkably successful in stabilizing proteins to irreversible
inactivation, such as λ repressor (Hecht et al., VM%), subtilisin
(Bryan et al., 1987; Cunningham & Wells, I9r; Pantoliano
et al., 1989), kanamycin nucleotidyltransferase (Liao et al.,
1986; Matsumura, 1986), neutral protease (Imanaka et al.,
1986), and T4 lysozyme (Wetzel et al., 1988; Matsumura et
al., 1989). This strategy has been applied to enhancing the
catalytic efficiency of a weakly active variant of subtilisin
(Carter ei ai. f 1989), engineering the aufoiraic a^faily of 
mtftilffln (Walls et al , 1987a,h; RtssellHS^c^^ 



ΣΔΔG_unfolding components

Figure 5: Plot showing sum of changes in free energy of unfolding
for component mutants and resulting multiple mutant. Data are taken
from Table IV and represent staphylococcal nuclease (0), N-terminal
domain of λ repressor (O), T4 lysozyme (0), bacteriophage f1 gene
V product (O), Kringle-2 domain of tissue plasminogen activator (A),
turkey ovomucoid third domain (a), and the α-subunit of Trp
synthetase (V). The dashed line represents a theoretical line of unity
slope, and the solid line represents the best fit.

natural variants of ovomucoid third domain (two examples;
Otlewski & Laskowski, 1990), and the Kringle-2 domain of
human tissue plasminogen activator (t-PA) (one example; R.
Kelley, personal communication).

Collectively, this data set gives a high linear correlation (R²
= 0.94) and slope near unity (Figure 5). The generally simple
additive behavior is somewhat surprising given the highly
cooperative nature of protein folding. There are discrepancies
in some of the additivity examples besides the staphylococcal
nuclease mutant (V66L,G88V). For example, the 1.5
kcal/mol discrepancy for the Y175C,G211E double mutant
in Trp synthetase (Table IV) is proposed to result from the
fact that these residues are in direct contact (Hurle et al.,
1986). Furthermore, proximity effects may account for the
large differences between the sum of the component mutants
and the multiple mutants for the α-helical double glycine
mutant G46A,G48A in λ repressor (Hecht et al., 1986) and
when combining R96H with the C3-C97 disulfide mutant in
T4 lysozyme (Wetzel et al., 1988). In contrast, an exchange
of two side chains that contact one another (V35I and I47V)
in the hydrophobic core of the gene V product of f1 phage
produced simple additive effects (Sandberg & Terwilliger,
1989; Table IV). It should be noted that this data base
exhibiting simple additivity may be biased for single mutants
that stably fold, because severely unstable proteins are
difficult to express.

By analogy to transition-state binding effects, one can
certainly imagine instances where the stabilizing effects of
mutations should reach a plateau. For example, denaturation
at high temperatures can become controlled by a chemical step
such as deamidation (Ahern et al., 1987), so that additional
mutants that stabilize the folded form of the protein may be
irrelevant. Another obvious example where complex additivity
can be observed in protein stability is the stabilizing effect of
disulfide bonds and noncovalent intramolecular contacts that
require interactions between two or more residues. In these
cases, the stabilizing interaction between two side chains can
be broken with only one mutation.

Applications of Additivity in Rational Protein 
Design 

A strategy of additive mutagenesis, where a series of single
mutants each making a small improvement in function are



the coenzyme specificity of glutathione reductase (Scrutton
et al., 1990), designing protease inhibitors with exquisite
protease specificity (Laskowski et al., 1989), and recruiting
human prolactin to bind to the hGH receptor (Cunningham
et a!., I In addition, additivity principles have been used 
to engineer the pH profile of subtilisin (Russell & Fersht,
1987) and to design the affinity and specificity of λ repressor
(Nelson & Sauer, 1985).

For this approach to work does not require that all the
component mutants act in a simply additive manner but just
that their effects accumulate. For example, despite the
complex additivity of effects in the catalytic triad of subtilisin, there
are mutagenic pathways that are energetically cumulative for
installing the triad (Carter & Wells, 1988; Wells et al., 1987c).
Starting with the triple mutant S221A,H64A,D32A, there is
a progressive enhancement for installing Ser221 (-1.1
kcal/mol), then His64 (-1.0 kcal/mol), and finally Asp32 (-6.5
kcal/mol). Another cumulative pathway of Ser221, then
Asp32, and finally His64 is possible if the Ser221,Asp32
intermediate were to use His P2 substrates (Carter &
1987). Elaborating such cumulative pathways is important 
for understanding how a catalytic apparatus may have evolved
and is practically useful for considering how to install such
catalytic machinery into weakly active catalytic antibodies.
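The cumulative pathway described above can be checked with simple running totals. The sketch below uses only the three step energies quoted in the text; the helper names are hypothetical, and "cumulative" is taken here to mean that every step lowers the running total.

```python
from itertools import accumulate

# Step energies (kcal/mol) quoted in the text for reinstalling the triad,
# starting from the triple mutant S221A,H64A,D32A.
steps = [("Ser221", -1.1), ("His64", -1.0), ("Asp32", -6.5)]


def running_totals(pathway):
    """Cumulative free energy change after each installation step."""
    return list(accumulate(ddg for _, ddg in pathway))


def is_cumulative(pathway):
    """True when every step improves (lowers) the running total."""
    return all(ddg < 0 for _, ddg in pathway)


totals = running_totals(steps)
for (residue, ddg), total in zip(steps, totals):
    print(f"install {residue}: step {ddg:+.1f}, running total {total:+.1f} kcal/mol")
print("energetically cumulative:", is_cumulative(steps))
```

The final running total is close in magnitude to the +8.65 kcal/mol listed for the triple mutant in Table III, which is what one would expect if the pathway ends at the wild-type enzyme.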

Conclusions

In the majority of cases, combination of mutations that
affect substrate or transition-state binding, protein-protein
interactions, DNA-protein recognition, or protein stability
exhibits simple additivity. Simple additivity is commonly
observed for distant mutations at rigid molecular interfaces
such as in protein-protein and DNA-protein interactions,
where the mutations are unlikely to alter grossly the structure
or mode of binding.

Large deviations from simple additivity can occur when the
sites of mutations strongly interact with one another (by
making direct contact or indirectly through electrostatic
interactions or large structural perturbations) and/or when both
sites function cooperatively (as for the catalytic triad and
oxyanion binding site of subtilisin). Changes at sites that can
contact each other do not always lead to complex additivity;
this may reflect relatively weak interactions between the two
sites or indicate that the interactions are compensatory and
appear to be weak.

It is important to point out the magnitude of errors in
predicting the free energy effect in the multiple mutant from
the component single mutants. Generally, for those cases
exhibiting simple additivity (Figures 1, 4, and 5), the
discrepancy in free energy between the sums of the components
and multiple mutants is about ±25%. Part of this is the result
of compounding errors when summing the single mutants, and
the rest is presumably due to weak interaction terms.
Nonetheless, this means that if the total free energy change
is about 3 kcal/mol, the change in the equilibrium constant
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(related to Kg/K* « 10-V«- = i S5 ) will often be off by a 
factor of ~4. Thus, while the free energy effects accumulate,
significant deviations will occur in predicting the final
equilibrium constants when component mutants contribute a large
free energy term.
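The step from a free energy error to an error in the equilibrium constant can be made concrete. The sketch below assumes the standard relation K_a/K_b = exp(-ΔΔG/RT) and a temperature of 25 °C; both the temperature and the function name are assumptions for illustration, not details taken from the text.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal/(mol*K)


def fold_error(ddg_error_kcal, temperature_k=298.15):
    """Multiplicative error in an equilibrium constant caused by a free energy error."""
    return math.exp(abs(ddg_error_kcal) / (GAS_CONSTANT * temperature_k))


total_change = 3.0        # kcal/mol, the example used in the text
relative_error = 0.25     # the ±25% discrepancy quoted in the text
ddg_error = total_change * relative_error

print(f"free energy error: {ddg_error:.2f} kcal/mol")
print(f"equilibrium constant off by a factor of about {fold_error(ddg_error):.1f}")
```

Because the dependence is exponential, the same 25% relative error applied to a larger total free energy change gives a disproportionately larger error in the predicted equilibrium constant.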

Simple additivity reflects the modularity of component
amino acids in protein function. This results from the fact
that the perturbations in energetics and structure resulting
from most mutations are highly localized. In the past six years
an additive mutagenesis strategy has been extremely effective
in engineering proteins; of course, nature has been using this
strategy much longer.
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abstpj*CT: Femtosecond spectroscopy was used in combination with site-directed mutagenesis to study the 
influence of tyrosine M210 (YM210) on the primary electron transfer in the reaction center of Rhodobacter
sphaeroides. The exchange of YM210 to phenylalanine caused the time constant of primary electron transfer
to increase from 3.5 ± 0.4 ps to 16 ± 6 ps while the exchange to leucine increased the time constant even
more to 22 ± 8 ps. The results suggest that tyrosine M210 is important for the fast rate of the primary
electron transfer.



The primary photochemical event during photosynthesis of
bacteriochlorophyll- (Bchl-) containing organisms is a
light-induced charge separation within a transmembrane protein
complex called the reaction center (RC). The crystal
structures of RCs from Rhodopseudomonas (Rps.) viridis and
Rhodobacter (Rb.) sphaeroides have been solved to high
resolution [reviewed in Deisenhofer and Michel (1989), Chang
et al. (1986), Tiede et al. (1988), and Rees et al. (1989)]. The
RC from Rb. sphaeroides contains three protein subunits
referred to as L, M, and H, according to their respective
mobilities in SDS-polyacrylamide gels. Associated with the
L and M subunits are the cofactors, consisting of four Bchl
a, two bacteriopheophytin (Bph) a, one atom of non-heme
ferrous iron, two quinones (Q_A and Q_B), and in some species
one carotenoid [reviewed in Parson (1987) and Feher et al.

1 Financial support was from the Deutsche Forschungsgemeinschaft,
SFB 143.

* To whom correspondence should be addressed.
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(1989)]. The cofaclors are arranged in two branches (Figure 
1) with an approximate C2 axis of symmetry. The kinetic data
support a model in which the primary electron transfer
proceeds after light absorption by the primary donor [a special
pair of Bchl referred to as P; reviewed in Kirmaier and Holten
(1987)]. The absorption of light generates the excited
electronic state P*, which has a lifetime of approximately 3 ps.
An electron is transferred from P along only one branch (the
so-called A-branch). It is generally accepted that after
approximately 3 ps the electron arrives at the Bph on the A-side
(H_A) and after 220 ps it reaches Q_A. The role of the accessory
Bchl located between P and H_A (referred to as B_A) has not
been definitely assigned. Recently, we have shown that at
room temperature an additional kinetic (τ = 0.9 ps) component
is detectable (Holzapfel et al., 1989). The spectral properties
and the kinetic constants lead to the conclusion that the
corresponding intermediate is the radical pair P⁺B_A⁻
(Holzapfel et al., 1990).

Additional intriguing points concerning the process of 
© 1990 American Chemical Society 
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Computational Complexity, 
Protein Structure Prediction, 
and the Levinthal Paradox 

J. Thomas Ngo, Joe Marks, and Martin Karplus 



1. Perspectives and Overview 

A protein molecule is a covalent chain of amino acid residues. Although
it is topologically linear, in physiological conditions it folds into a unique
(though flexible) three-dimensional structure. This structure, which has
been determined by x-ray crystallography and nuclear magnetic resonance
for many proteins (Bernstein et al., 1977; Abola et al., 1987), is referred to
as the native structure. As demonstrated by the experiments of Anfinsen and
co-workers (Anfinsen et al., 1961; Anfinsen, 1973), at least some protein
molecules, when denatured (unfolded) by disrupting conditions in their
environment (such as acidity or high temperature) can spontaneously refold
to their native structures when proper physiological conditions are restored.
Thus, all of the information necessary to determine the native structure can
be contained in the amino acid sequence.

From this observation, it is reasonable to suppose that the native fold 
of a protein can be predicted computationally using information only about 
its chemical composition. In particular, it should be possible to write down 
a mathematical problem that, when solved, gives the native conformation 
of the protein. This procedure would be self-contained, in the sense that 
no additional information about the biology of protein synthesis would 
be required. Further, it is reasonable to hope that this procedure could 
be accomplished without requiring an astronomical amount of computer 
resources, given the observation that polypeptide chains do fold to their 
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Table 14-1. Glossary of problem-related teams. 



problem instance The task of finding the global minimum^a of a particular
function f(x_1, x_2, ..., x_n), e.g., the task of finding the value of x that will
give the smallest possible value of f(x) = 18x^7 + x^3 - 3x^2. Note that the
problem instance must be self-contained: everything necessary to specify
the problem fully must appear.

problem A well-defined class of problem instances, e.g., the task of
globally minimizing f(x_1, x_2, ..., x_n), where f is any polynomial in the
variables x_1, x_2, ..., x_n.

restriction Some stipulation about what constitutes an instance of the
problem. In the example we have been using so far, it was stated that f is
restricted to be a polynomial in one or more real variables.

restricted problem One problem, X, is said to be a restricted version
of Y if X comprises a subset of the instances of Y. For example, the
problem of globally minimizing quadratic functions is a restricted version
of the problem of globally minimizing polynomial functions, because all
quadratic functions are polynomials.

instance description The information needed to specify a particular
instance of a given problem. In the example we have been using so far,
the instance description might contain the following information: (a) the
number of variables x_i is 1; (b) the polynomial is of degree 7; (c) the
coefficients of the polynomial terms are, in descending order of degree,
{18, 0, 0, 0, 1, -3, 0, 0}. Note that it is possible to construct a precise
instance description like this only when the restrictions on a problem are well
defined.

instance size The amount of memory needed to store a particular instance 
description. 



a For simplicity in this discussion, we assume that the only computational
tasks of interest to the reader are those that involve global optimization of
a function.



native structures in an amount of time far less than required for an exhaustive 
search. 
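The instance description used as the running example in Table 14-1 maps naturally onto a small data structure. The following is a minimal sketch: the class name, the field names, and the way instance size is counted (one unit per stored number) are all assumptions made for illustration and are not part of the chapter.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PolynomialInstance:
    """Instance description for minimizing a single-variable polynomial."""
    n_variables: int
    degree: int
    coefficients: Tuple[float, ...]  # descending order of degree

    def evaluate(self, x):
        """Evaluate the polynomial at x using Horner's rule."""
        result = 0
        for coefficient in self.coefficients:
            result = result * x + coefficient
        return result

    def size(self):
        """Crude instance size: how many numbers the description stores."""
        return 2 + len(self.coefficients)


# The example from the glossary: f(x) = 18x^7 + x^3 - 3x^2.
instance = PolynomialInstance(
    n_variables=1,
    degree=7,
    coefficients=(18, 0, 0, 0, 1, -3, 0, 0),
)
print("f(1) =", instance.evaluate(1))
print("f(2) =", instance.evaluate(2))
print("instance size =", instance.size())
```

An algorithm for the problem would take such an object as its only input, which is why the glossary insists that the description be complete.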

Based partly on these suppositions, computational chemists have
developed and are refining expressions for the potential energy of a protein as
a function of its conformation (Momany et al., 1975; Brooks et al., 1983;
Weiner et al., 1986, for example). According to the Thermodynamic
Hypothesis (Epstein et al., 1963), finding the global minimum of a protein's
potential-energy function should be tantamount to identifying the protein's
native fold.1 (All notes are at the end of the chapter.) Unfortunately, the
task of finding the global minimum of one of these functions is not easy, in
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Ibble 14-2, Glossary of algorithm-related terms. 



algorithm A computer program that takes an instance description as input
and supplies an answer to the given problem instance as output. Note that
an algorithm is always associated with a particular problem, including any
associated restrictions; otherwise, the instance description may not make
any sense to the program.

correctness An algorithm is said to be correct for a given problem
instance if it gives the right answer when fed with the description of that
instance. It is said to be correct for a problem if it is correct for all possible
instances of the problem. When an algorithm is simply said to be correct,
it is understood to be correct for the problem for which it was designed.

efficiency An algorithm is efficient for a given problem if it is guaranteed
to return some answer, right or wrong, to every possible instance of the
problem, within an amount of time that is a polynomially bounded function
of the instance size. Note that there is no notion of efficiency for a single
problem instance.

It must be emphasized that this definition of the word "efficiency," which
we employ throughout this review, does not necessarily correspond to the
practical notion of efficiency. An algorithm whose running time is
proportional to n^100, where n is the size of a problem instance, is far from
practical; nevertheless, it is considered efficient by this definition. Thus,
while it is usually reasonable to assume that an inefficient
(exponential-time) algorithm is too slow to be of any practical use except for small
problem instances, it is not always reasonable to assume that an efficient
algorithm is fast enough to be of practical use.

guarantee A statement about the behavior of an algorithm that can be
proved rigorously. For example, an algorithm is not considered to be
efficient for a problem unless polynomial time bounds can be proved; it is not
enough to be able to say that the algorithm has met the given time bounds
for all instances of practical interest with which it has been tested.

polynomially bounded function A function f(n) is polynomially
bounded if there exists some polynomial function g(n) such that f(n) < g(n)
for every positive value of n. In the interest of brevity, computer scientists
call a function polynomial if it is polynomially bounded, even though some
functions that are not normally considered polynomial (such as log n) are
included in this definition. Similarly, a function is called exponential if it
is not polynomially bounded, even though some functions that fit this
description (such as n^(log n)) are not normally thought of as exponential (Garey
and Johnson, 1979).

exhaustiveness An algorithm is said to search its solution space exhaus- 
tively if it tests every possible candidate solution. An algorithm that is 
exhaustive will always be correct; but to be correct, an algorithm need not 
be exhaustive. 
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particular because the potential-energy sdrface of a protein (as represented 
by an empirical potential) contains many local minima (Elber and Karplus, 
1987, for example). Many clever and extremely creative techniques have 
been employed to try to escape from these local minima (Hela et al., 1989; 
Gordon and Somorjai, 1992; Head-Gordon and Stillinger, 1993), but so 
far no practical method for globally optimizing the potential energy of a 
protein has been produced. 

The central question addressed in this review is this: Is there some 
clever algorithm, yet to be invented, that can find the global minimum of a 
protein's potential-energy function reliably and reasonably quickly? Or is 
there something intrinsic to the problem that prevents such a solution from 
existing? 

Many measures of a problem's difficulty are possible. They range from
the informal to the formal, and they focus on various sources of difficulty.
Each can be useful in its own right. For example, proteins are certainly "hard
to model with quantitative accuracy," the more realistic energy functions
are "complicated to code as computer subroutines," and the algorithms that
one uses to try to find the minima of these functions can consume seemingly
unlimited amounts of supercomputer time.

Instead of such qualitative statements, the results reviewed here focus 
on a formal measure of difficulty called intractability. A problem is said 
to be tractable if there exists an algorithm for it that is guaranteed to 
be correct and efficient. It is said to be intractable if no such algorithm
exists. (Several terms such as "correct" and "efficient" are used in a precise 
technical sense in this chapter. Tables 14-1 and 14-2 list their definitions.) 
When a problem is intractable, there is generally a rather unforgiving limit to
the size of the problem instances that can be solved correctly before running 
times become astronomically large, and this limit is relatively uninfluenced 
by implementation details such as cleverly designed program codes and 
improvements in computer hardware. 

It is widely believed that the problem of locating the global minimum 
of a protein's potential energy function is intractable by this definition.
However, the conventional reasoning underlying this belief is fallacious. 
The conventional argument proceeds as follows. Although the bond lengths 
and angles in a protein can be predicted easily since they cannot vary much, 
torsion angles (rotations about bonds) are not easily predicted. Rotatable 
torsions tend to have three preferred values, none of which can be ruled 
out a priori. If there are N rotatable torsions in the protein molecule, then 
there are 3^N possible combinations of those torsions. To be sure to find the
best combination, a computer program would have to try them all. But 3^N
is a huge number for typical values of N because it is exponential in N. For 
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typical values of N (~ 100 or larger) and the speed of current hardware, 
the expected running time of this exhaustive algorithm is astronomically 
long. 
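
To give a sense of scale for the counting argument above, the following minimal sketch tabulates 3^N and the corresponding exhaustive-search time. The evaluation rate of 10^9 conformations per second is an assumed, purely illustrative figure, and the function names are hypothetical; neither comes from the text.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def conformation_count(n_torsions, states_per_torsion=3):
    """Number of torsion combinations when each torsion has a few preferred values."""
    return states_per_torsion ** n_torsions

def exhaustive_search_years(n_torsions, evals_per_second=1e9):
    """Years needed to test every combination at an ASSUMED evaluation rate."""
    return conformation_count(n_torsions) / evals_per_second / SECONDS_PER_YEAR

for n in (10, 20, 50, 100):
    print(n, conformation_count(n), f"{exhaustive_search_years(n):.3g} years")
```

Changing the assumed rate by several orders of magnitude shifts the table only slightly relative to the growth in N, which is the sense in which the limit is insensitive to hardware improvements.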

This reasoning is fallacious. For some variants of the problem, it is 
wrong. 2 Every problem in combinatorial optimization 3 has an exponential 
number of candidate solutions. It will therefore require an exponential 
amount of time to solve any such problem by an exhaustive search of 
its candidate solutions. However, it is rarely necessary to proceed by such 
brute-force tactics (see Figure 14-6). With some thought, it is nearly always
possible to do many orders of magnitude better than exhaustive search. 
Moreover, with many combinatorial optimization problems, algorithms can 
be found that are both efficient and correct (have polynomial time bounds 
and always give a right answer; see Table 14-2).
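
As an illustration of how far one can improve on exhaustive search when the structure of a problem permits, consider a toy chain model that is assumed here only for the example: each of N sites takes one of three states, and the energy is a sum of single-site terms and terms coupling adjacent sites. This is not the protein problem and not any published structure-prediction algorithm; it is ordinary dynamic programming, and all names and the random test instance are hypothetical.

```python
import itertools
import random

def chain_energy(states, site, pair):
    """Energy of one assignment: single-site terms plus adjacent-pair terms."""
    e = sum(site[i][s] for i, s in enumerate(states))
    e += sum(pair[i][states[i]][states[i + 1]] for i in range(len(states) - 1))
    return e

def brute_force_min(site, pair, k=3):
    """Exhaustive search: examines all k**N assignments."""
    n = len(site)
    return min(chain_energy(c, site, pair)
               for c in itertools.product(range(k), repeat=n))

def dp_min(site, pair, k=3):
    """Dynamic programming: O(N * k**2) work, same minimum as exhaustive search."""
    best = list(site[0])  # best[s] = minimum energy of the prefix ending in state s
    for i in range(1, len(site)):
        best = [site[i][t] + min(best[s] + pair[i - 1][s][t] for s in range(k))
                for t in range(k)]
    return min(best)

def random_instance(n, k=3, seed=0):
    """Random energies, used only to exercise the two solvers."""
    rng = random.Random(seed)
    site = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n)]
    pair = [[[rng.uniform(-1, 1) for _ in range(k)] for _ in range(k)]
            for _ in range(n - 1)]
    return site, pair

site, pair = random_instance(8)
print(brute_force_min(site, pair), dp_min(site, pair))
```

The two minima agree up to floating-point rounding, but only `dp_min` remains usable for chains of hundreds of sites. The shortcut depends entirely on the assumption that only adjacent sites interact.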

Is global potential-energy minimization inherently impossible to accom- 
plish efficiently without sacrificing correctness, or is an efficient, correct 
algorithm waiting to be found? Recently, efforts have been made to an- 
swer this question using the formal tools of the theory of NP-completeness 
(Ngo and Marks, 1992; Unger and Moult, 1993). Introduced in the 1970's,
NP-completeness theory (Lewis and Papadimitriou, 1978; Garey and John- 
son, 1979; Lewis and Papadimitriou, 1981) was developed to help discover 
why certain problems in combinatorial optimization seem to be intractable, 
whereas others are not. A problem that is found to be NP-complete or
NP-hard by an analysis of this type is intractable if P≠NP. (The meaning
of the proposition "P≠NP" is summarized in Figure 14-1.)

One motivation for undertaking this line of inquiry pertains to the de- 
velopment of structure-prediction algorithms. Practical solutions to NP- 
complete problems do exist. They are compromises that entail
well-understood tradeoffs between guarantees of efficiency, correctness, and generality
(Papadimitriou and Steiglitz, 1982), and the forms of these compromises 
have been studied extensively. Thus, the mere fact that a problem is known 
to be NP-complete can guide algorithm developers to existing classes of 
heuristic solutions. Moreover, the details of an NP-completeness proof can 
expose the sources of a particular problem's complexity. With this knowl- 
edge, the algorithm developer can know in advance that certain algorithms 
are bound to fail, and might identify restricted forms of the problem that 
can be solved efficiently. These issues are discussed in Section 4. 

The other goal is to obtain an improved understanding of the Levinthal
paradox (Levinthal, 1968, 1969). The Levinthal paradox refers to the
observation that although a protein is expected to require exponential time to 
achieve its native state from an arbitrary starting configuration, the process 
of folding is not observed to require exponential time. But why is a protein 
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expected to require exponential time to fold? The conventional justification 
for this premise requires the use of a model of protein behavior that leads to 
incorrect physical consequences. A reformulation of the Levinthal paradox 
with a more rigorous reason to expect exponential-time folding is discussed 
in Section 5. 



"P≠NP": What does it mean?

The proposition P≠NP, whose truth (or falsehood) has not been proved,
is a pivotal conjecture in the theory of NP-completeness (Lewis and
Papadimitriou, 1978). P is the class of problems for which correct algorithms
with polynomial time bounds exist; NP is the class of problems for which a 
correct answer can be verified in polynomial time. The underlying theory 
and the precise meaning of the designation "NP" are too subtle to treat 
properly in this review, but the consequences of the proposition are easily 
summarized. 

• Computer scientists have identified several classes of problems whose 
intractability is likely, but not yet proved. Two of the most important
such classes are called NP-complete and NP-hard. Membership in 
these classes is well-defined whether or not P=NP. Hundreds of NP- 
complete and NP-hard problems have been identified; many of these 
problems are of great practical significance. 

• If P=NP, then all NP-complete problems are efficiently solvable. If 
P≠NP, then all NP-complete problems are intractable.

• The class of NP-hard problems includes all NP-complete problems 
and others. An NP-hard problem can be thought of as being "at least 
as hard as" an NP-complete problem. If the NP-complete problems 
are intractable, then all of the NP-hard problems are intractable. But 
if the NP-complete problems are efficiently solvable, some NP-hard
problems will still be intractable. 

• (Almost) nobody believes that P=NP. 



Figure 14-1. Meaning of the proposition "P≠NP."
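
The distinction between finding an answer and verifying one can be made concrete with Partition, the problem used in one of the reductions reviewed in this chapter: given a list of positive integers, decide whether it can be split into two parts with equal sums. The sketch below is only an illustration; the function names are hypothetical, and exhaustive search stands in for "a solver" without any claim that it is the best available method.

```python
from itertools import product

def verify_partition(numbers, in_first_half):
    """Check a proposed split in time linear in len(numbers)."""
    first = sum(x for x, flag in zip(numbers, in_first_half) if flag)
    return len(in_first_half) == len(numbers) and 2 * first == sum(numbers)

def find_partition(numbers):
    """Exhaustive search over all 2**n splits; returns a valid split or None."""
    for flags in product((False, True), repeat=len(numbers)):
        if verify_partition(numbers, flags):
            return flags
    return None

numbers = [3, 1, 1, 2, 2, 1]
split = find_partition(numbers)
print(split, verify_partition(numbers, split))
```

Checking a proposed split costs one pass over the list, whereas the search shown here may examine every one of the 2^n candidate splits before succeeding or giving up.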

The conclusions that may be drawn from the results described here are 
rigorous but qualified. In addition to reviewing the results themselves, much 
of the space in this chapter is devoted to identifying the caveats associated
with each possible inference. There are some fundamental limitations of 
the scope of this approach that we state in advance. 

First, the theory of NP-completeness can be used to address only certain 
aspects of the protein-folding problem. The protein-folding problem can be 
defined as encompassing, but not being limited to, the following questions: 
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1. Why does a protein have a unique native structure, i.e., why is such a 
small portion of a protein's conformational space significantly popu- 
lated under physiological conditions? 

2. What can be said about the pathway(s) in conformation space by which 
proteins reach their native states? 

3. What accounts for the observed rate at which proteins fold? 

4. Can a protein's three-dimensional structure be predicted from its amino 
acid sequence, and if so, how? 

The results that we review here are related directly to questions about struc- 
ture prediction (4) and indirectly to the consideration of folding rates (3), 
but they have little to do with the existence of unique native structures 
(1) and the pathway(s) by which a protein folds (2). (This is not to say
that the four questions are unrelated to each other. All aspects of the 
protein-folding problem are determined by the potential-energy function 
that describes the polypeptide chain. Thus, the population of a unique na- 
tive structure is clearly related to the prediction of that structure.) Many 
aspects of the protein-folding problem are addressed in a recent volume 
edited by Creighton (1992).

Second, it is incorrect to state that "the protein-structure prediction
problem has been shown to be NP-hard." There are numerous approaches
(Fasman, 1988) to protein-structure prediction that either do not employ global
potential-energy minimization at all, or include stipulations on the nature of 
the solution in addition to the energy function itself. Secondary-structure 
prediction, when based on statistical rules derived from known structures, 
is one approach in which no potential-energy function is used. While NP- 
completeness theory certainly could be used to analyze secondary-structure 
prediction, the analysis might be irrelevant because models for secondary- 
structure prediction are usually designed with efficient (usually linear-time) 
algorithms in mind. The difficulty of predicting secondary structure arises 
from the inadequacy of the underlying models in predicting what occurs
in reality, not from the time required to solve the computational tasks that 
arise in the context of those models. The results reviewed here have a 
formal bearing only on algorithms that operate by attempting to solve a 
self-contained 4 global minimization problem. 

It is also incorrect to state that "global potential-energy minimization 
for a protein has been shown to be NP-hard," even though this statement 
is much closer to the truth than the previous one. An NP-completeness
proof can, at best, address a form of the problem that is more general 
than protein-structure prediction by energy minimization — and therefore, 
possibly more difficult. Put another way, protein-structure prediction might
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be an easy restricted form (special case) of the problems (Sections 3.3, 3.4,
and 3.5) that are known to be NP-hard. This subtle point, which we discuss
in Sections 4 and 6, turns out to be central to understanding the limitations 
of the results. 

Third, for some purposes, intractability itself is not as bad as it sounds. 
The intractability of a problem means that no algorithm for it can be efficient 
and correct. These qualities would make a protein-structure prediction
algorithm, perhaps literally, "too good to be true." Most developers of 
protein-structure prediction algorithms gave up on such high standards long 
ago; they focus efforts on developing algorithms that fall short of the ideal 
in some way. Thus, the NP-hardness of a problem is a somewhat weak 
statement, even given the very likely assumption that P≠NP. An objective
of this review is to explore what may be inferred from this weak statement, 
given that it is one of very few statements about the difficulty of protein- 
structure prediction that are known rigorously to be true. 

In writing this review, we have tried to focus on objectives that are 
appropriate for this young line of inquiry. First, we believe that the question 
most important to the reader, given that he accepts that the proofs being 
reviewed are mathematically correct, is what they mean. Accordingly, we 
include an exposition of the theory of NP-completeness (Section 2) that 
is intended to give an accurate picture of the form of an NP-completeness 
proof without becoming mired in technical details. Similarly, in explaining 
the proofs themselves (Sections 3.3, 3.4, and 3.5), we describe the essential 
steps — it is impossible to understand exactly what is proved otherwise — 
but do not attempt to persuade the reader of their mathematical correctness. 
We invite the reader who is interested in the technical details to refer to 
the original papers (Ngo and Marks, 1992; Fraenkel, 1993; Unger and 
Moult, 1993) and to existing references on the theory of NP-completeness 
(Lewis and Papadimitriou, 1978; Garey and Johnson, 1979; Lewis and 
Papadimitriou, 1981). 

Second, we believe that one of the functions of a review is to reinterpret 
and evaluate existing results. Therefore, interspersed with a straightforward 
recitation of the facts, the reader will find our opinions and speculations
about the implications of this line of reasoning, both for the development 
of algorithms (Section 4) and for the behavior of real proteins (Section 5). 

Third, from the limitations of the existing results, it is clear that the use 
of computational complexity theory for tasks in protein-structure prediction 
is by no means a closed book. In Section 6 we point out areas in which 
continued analysis might be of value, particularly given the results that have 
already been established.
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which depends steeply on the energy gap U. Given the assumptions that
N = 100 and n = 2, it was found that in the limit U → 0, the first-passage
time is nearly 10^30 years. However, a modest change to the value of U, say
U = 2kT, lowers the first-passage time to under one second. (The base of
the exponential, 1 + n exp(-U/kT), is equal to 3 when U = 0, but 1.27
when U = 2kT.)
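
The arithmetic behind the two quoted bases can be checked directly. The sketch below evaluates 1 + n exp(-U/kT) with n = 2 for U = 0 and U = 2kT; the function name is hypothetical, and the quantity base**100 is printed only to show how steeply the exponential factor responds to U. It is not the first-passage time itself, whose prefactors are not given in this passage.

```python
import math

def exponential_base(u_over_kt, n=2):
    """Base of the exponential, 1 + n*exp(-U/kT), with U expressed in units of kT."""
    return 1.0 + n * math.exp(-u_over_kt)

for u in (0.0, 2.0):
    b = exponential_base(u)
    print(f"U = {u:g} kT: base = {b:.2f}, base**100 = {b ** 100:.3g}")
```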

The analysis of Zwanzig et al. resolves a form of the Levinthal paradox
in which the absence of clues about the form of the native state is the sole 
basis for expecting exponential-time folding. However, it does not resolve 
the form of the paradox based on computational complexity, since the opti- 
mization problem implied by the underlying model can be solved trivially 
in linear time. The reason for the tractability of the underlying model is 
the lack of long-range interactions, which are critical to rendering PSP
NP-hard (Ngo and Marks, 1992), and essential for cooperativity (Karplus and
Shakhnovich, 1992). 

6. Future Work 

It is not known whether there exists an efficient algorithm for predicting the 
structure of a given protein from its amino acid sequence alone. Decades 
of research have failed to produce such an algorithm, yet Nature seems to 
solve the problem. Proteins do fold! The "direct" approach to structure
prediction, that of directly simulating the folding process, is not yet possible
because contemporary hardware falls eight to nine orders of magnitude short
of the task. However, while this difference is large, it is not astronomical. 
Would this "direct" approach constitute an efficient and correct algorithm 
for protein-structure prediction? Too little is known about protein folding, 
and about the future of computing technology, to be able to answer this
question at this time. 

The results reviewed here (Section 3) do not completely rule out the
existence of a protein-structure prediction algorithm that is both efficient and
correct, in the precise senses of those words used throughout this chapter.
In particular, it remains formally possible that there is a restricted form of
PSP that is efficiently solvable, but subsumes protein-structure prediction. 
How can this possibility be investigated? 

A standard strategy in the analysis of any NP-hard problem is to ex- 
amine restricted forms of the problem systematically, classifying each as 
tractable or NP-hard, and thereby exposing the sources of the complex- 
ity. Barahona's results with Ising spin-glass models, which were described 
briefly in Section 4, are exemplary of this approach. While the particular 
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restrictions chosen by Barahona for spin glasses (reduction of dimensional- 
ity and removal of the magnetic field) are not suitable for protein-structure 
prediction, the overall strategy of examining restricted forms is appropriate. 
Some restricted form of PSP in which compactness plays a critical role is 
a candidate for this type of analysis (Section 4.6). 

The approach of considering restricted forms has worked well for 
dozens of important problems that are relatively "clean" and abstract (Garey 
and Johnson, 1979), but it may be difficult to pursue in the case of
protein-structure prediction. In the former case, the problem shown to be NP-hard
is usually as general as would actually be required in practice. In the latter 
case, what is desired is not an algorithm that can handle all possible in- 
stances of PSP (Section 3), but merely one that works for proteins. Thus, 
the fact that PSP is a generalization of protein-structure prediction makes 
the result that PSP is NP-hard less limiting than it could be. 

Ideally, one would like to demonstrate the NP-hardness of a problem 
that is more specific, not more general, than protein-structure prediction, be- 
cause that would automatically prove the NP-hardness of protein-structure 
prediction itself. This would entail finding an efficient transformation from 
some existing NP-complete problem that generates instances of PSP that 
are proteins by every conceivable criterion. 38 It is difficult to see how such 
a transformation might proceed. 39 

An alternative approach that may be nearly as instructive is to use 
the currently available result regarding PSP as a baseline in a continuing 
comparative analysis — to find restricted forms of PSP that are NP-hard 
but as specialized as possible, and to find others that are tractable but as 
general as possible. The motivations for pursuing this methodology are 
both practical and theoretical: 

• Every NP-hardness result permits us to know in advance that a certain
group of algorithms is likely to fail, and is therefore not worth pursuing 
(Section 4). 

• Conversely, every NP-hardness result helps identify a source of
complexity in protein-structure prediction, and therefore what must be
stripped away from the problem before it is reasonable to attempt 
efficient solution. 

The work of Finkelstein and Reva (1992) is a good example; an 
approach to structure prediction with a guaranteed polynomial time 
bound was developed. The critical assumption behind the algorithm is 
that only nonbonded interactions between nearest neighbors along the 
chain are significant. Because of this assumption, the algorithm can- 
not solve all instances of PSP, but instead is restricted to instances in 
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which only nonbonded interactions between nearest neighbors along 
the chain are nonzero. 40 This violates the requirements of the
reduction from Partition to PSP, in which nonbonded interactions between
sites distant from each other along the chain are essential. Thus, the
problem is similar in character to that examined by Zwanzig et al. 
(Section 5.3). While the Finkelstein-Reva algorithm was not inspired 
by an NP-hardness result, the underlying strategy is similar to how 
NP-hardness results might be used; they removed from the problem 
what they observed to be a source of complexity. However, in this 
case, removing the source of complexity led to a problem different 
from that posed by protein folding, in which long-range interactions
play an essential role. 

• The NP-hardness of PSP serves as the premise for a reformulation of
the Levinthal paradox (Section 5), whose conventional form is based 
on a model of folding that is in conflict with known experimental 
results. A motivation for pursuing an analysis of the computational 
complexity of protein-structure prediction is to assist in the construc- 
tive role of the Levinthal paradox — to help focus attention on the key 
questions in protein folding. 

A small number of reasonably well-defined potential resolutions to 
the computational-complexity form of the Levinthal paradox were 
listed in Section 5. One of the possible resolutions is that protein- 
structure prediction is tractable. NP-hardness results with restricted 
forms of PSP would make that possible resolution less likely, thus 
lending credence to the alternatives. 

Attempts to resolve the Levinthal paradox, which play a valid and 
useful role in helping to understand how proteins fold, can lead to 
confusion because the premises of the original form of the paradox 
are not well formulated. In particular, one such proposed resolution 
(Zwanzig et al., 1992) can be shown unequivocally not to resolve 
the computational complexity form of the paradox, and in related
arguments (Karplus and Shakhnovich, 1992) has been shown to lead
to physically incorrect consequences (Section 5.3). For the paradox to
be meaningful, it must be "falsifiable": it must be possible to know
when the paradox has been resolved.

In addition to restricted forms of PSP, it would be useful to know the 
computational complexity of other tasks in structure prediction that appear 
easier than the general problem, but whose complexities are none the less 
uncertain. 
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The task of computing side-chain conformations given full knowledge 
of a protein's backbone conformation is one such problem. Case studies us- 
ing simulated annealing (Lee and Subbiah, 1991) have suggested that pack- 
ing effects may suffice to determine, in part, the side-chain conformations 
in a protein's core. The computational complexity of this packing prob- 
lem is unknown. Because only short-range effects are present, the graph 
of possible side-chain-side-chain interactions can be known in advance, is 
sparse, and consists of vertices of low degree. Previous experience— for 
instance, with Ising spin-glass models (Barahona, 1982), graph colorabil- 
ity (Garey and Johnson, 1979, p. 191) and cartographic labeling (Formann 
and Wagner, 1991; Marks and Shieber, 1991)— illustrates that such neigh- 
borhood interactions can, on their own, give rise to NP-hardness. On the 
other hand, many problems that contain such neighborhood interactions 
are tractable if restrictions can be placed on the nature of the graph (Garey 
and Johnson, 1979), suggesting that the problem of finding a mutually ac- 
ceptable set of side-chain conformations for a protein could be tractable. 
(One currently known algorithm for predicting side-chain conformations 
based on backbone positions achieves 70% to 80% accuracy for χ1 and χ2
angles [Dunbrack and Karplus, 1993].) Not knowing the computational 
complexity of side-chain structure prediction leaves the algorithm devel- 
oper in the quandary of not knowing whether inexact methods are truly 
necessary, given the possible existence of a superior exact algorithm. 
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Foundation. This research was supported in part by grants from the National 
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Notes 

1 The Thermodynamic Hypothesis states that a protein's native fold is the con- 
figuration of globally minimal free energy. However, it is generally assumed that 
a protein's states of lowest free energy are similar enough in entropy to justify the 
use of potential energies instead of free energies as a computational convenience; 
potential energies are much faster and more straightforward to compute.

2 For example, if only nonbonded interactions between nearest neighbors along 
the chain are significant, the global minimum structure can be predicted efficiently 
(Finkelstein and Reva, 1992). 

3 The term combinatorial optimization is normally reserved for problems in which
the solution space is discrete. Throughout this chapter we use the term to refer 
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Introduction 



Other areas of major interest include the study of cell
interactions and intracellular control mechanisms in
differentiation and development [Auerbach and
Grobstein, 1958; Cox, 1974; Finbow and Pitts, 1981]
and attempts to analyze nervous function [Bornstein
and Murray, 1958; Minna et al., 1972]. Progress in
neurological research has, however, not had the benefit
of working with propagated cell lines, as propagation
of neurons has not so far been possible in vitro
without resorting to the use of transformed cells (see

Tissue culture technology has also been adopted into
many routine applications in medicine and industry.
Chromosomal analysis of cells derived from the womb
by amniocentesis can reveal genetic disorders in the
unborn child, viral infections may be assayed qualitatively
and quantitatively on monolayers of appropriate
host cells, and the toxic effects of pharmaceutical
compounds and potential environmental pollutants can be
measured in colony-forming assays.

Further developments in the application of tissue
culture to medical problems may follow from the
demonstration that cultures of epidermal cells form
functionally differentiated sheets in culture [Green et al.,
1979] and endothelial cells may form capillaries
[Folkman and Haudenschild, 1980], suggesting
possibilities in homografting and reconstructive surgery
using an individual's own cells. The introduction of
heterologous genetic material into mammalian cells
[... et al., 1979; Wigler et al., 1979], although
somewhat overshadowed by current propagation in
bacteria, may yet prove a desirable means for producing
biologically significant compounds such as growth
hormone and insulin. Similarly, the production of
monoclonal antibodies [Kohler and Milstein, 1975] in
hybrids between human plasma cells and human
myeloma cells may prove a valuable technique for the
production of specific antibodies.

It is clear that the study of cellular activity in tissue 
culture may have many advantages; but in summarizing
these, below, considerable emphasis must also be
placed on its limitations, in order to maintain some 
sense of perspective. 

ADVANTAGES OF TISSUE CULTURE 

Control of the Environment 

The two major advantages, as implied above, are
the control of the physicochemical environment (pH,
temperature, osmotic pressure, O2, CO2 tension),
which may be controlled very precisely, and the
physiological conditions, which may be kept relatively
constant but cannot always be defined. Most media still
require supplementation with serum, which is highly
variable [Olmsted, 1967; Honn et al., 1975] and
contains undefined elements such as hormones and other
regulatory substances. Gradually, however, the
functions of serum are being understood, and as a result it
is being replaced by defined constituents [Birch and
Pirt, 1971; Ham and McKeehan, 1978; Barnes and
Sato, 1980].



Characterization and Homogeneity of Sample 

Tissue samples are invariably heterogeneous.
Replicates even from one tissue vary in their constituent cell
types. After one or two passages, cultured cell lines
assume a homogeneous, or at least uniform,
constitution as the cells are randomly mixed at each transfer
and the selective pressure of the culture conditions
tends to produce a homogeneous culture of the most
vigorous cell type. Hence, at each subculture each
replicate sample will be identical, and the
characteristics of the line may be perpetuated over several
generations. Since experimental replicates are virtually
identical, statistical analysis of variance is
seldom required.

Economy

Cultures may be exposed directly to a reagent at a 
lower and defined concentration, and with direct access
to the cell. Consequently, less is required than for 
injection in vivo where >90% is lost by excretion and 
distribution to tissues other than those under study. 

DISADVANTAGES

Expertise

Culture techniques must be carried out under strict
aseptic conditions, because animal cells grow much
less rapidly than many of the common contaminants
such as bacteria, molds, and yeasts. Furthermore,
unlike microorganisms, cells from multicellular animals
do not exist in isolation and, consequently, are not able
to sustain independent existence without the provision
of a complex environment simulating blood plasma or
interstitial fluid. This implies a level of skill and
understanding to appreciate the requirements of the system
and to diagnose problems as they arise. Tissue culture
should not be undertaken casually to run one or two 
experiments. 



Quantity 

A major limitation of cell culture is the expenditure 
of effort and materials that goes into the production of 
relatively little tissue. A realistic maximum per batch 
for most small laboratories (2 or 3 people doing tissue 
culture) might be 1-10 g of cells. With a little more 
effort and the facilities of a larger laboratory, 10-100 
g is possible; above 100 g implies industrial pilot plant 
scale, beyond the reach of most laboratories, but not 
impossible if special facilities are provided. 

The cost of producing cells in culture is about ten 
times that of using animal tissue. Consequently, if 
large amounts of tissue (>10 g) are required, the 
reasons for providing them by tissue culture must be 
very compelling. For smaller amounts of tissue (~10
g), the costs are more readily absorbed into routine 
expenditure; but it is always worth considering whether 
assays or preparative procedures can be scaled down. 
Semimicro- or micro-scale assays can often be quicker 
due to reduced manipulation times, volumes, centri- 
fuge times, etc. and are often more readily automated 
(see under Microtitration, Chapter 19). 

Instability 

This is a major problem with many continuous cell 
lines resulting from their unstable aneuploid chromo- 
somal constitution. Even with short-term cultures, al- 
though they may be genetically stable, the heterogeneity 
of the cell population, with regard to cell growth rate, 
can produce variability from one passage to the next. 
This will be dealt with in more detail in Chapters 12 
and 18. 

MAJOR DIFFERENCES IN VITRO 

Many of the differences in cell behavior between 
cultured cells and their counterparts in vivo stem from 
the dissociation of cells from a three-dimensional ge- 
ometry and their propagation on a two-dimensional 
substrate. Specific cell interactions characteristic of 
the histology of the tissue are lost, and, as the cells 
spread out, become mobile and, in many cases, start 
to proliferate, the growth fraction of the cell popula- 
tion increases. When a cell line forms it may represent 
only one or two cell types and many heterotypic inter- 
actions are lost. 

The culture environment also lacks the several sys- 
temic components involved in homeostatic regulation 
in vivo, principally those of the nervous and endocrine 
systems. Without this control, cellular metabolism may 



be more constant in vitro than in vivo, but may not be 
truly representative of the tissue from which the cells 
were derived. Recognition of this fact has led to the 
inclusion of a number of different hormones in culture 
media (see Chapter 9) and it seems likely that this 
trend will continue. 

Energy metabolism in vitro occurs largely by glyco- 
lysis, and although the citric acid cycle is still func- 
tional it plays a lesser role. 

It is not difficult to find many more differences 
between the environmental conditions of a cell in vitro 
and in vivo and this has often led to tissue culture being 
regarded in a rather skeptical light. Although the exist- 
ence of such differences cannot be denied, it must be 
emphasized that many specialized functions are ex- 
pressed in culture and as long as the limits of the model 
are appreciated, it can become a very valuable tool. 

Origin of Cells 

If differentiated properties are lost, for whatever 
reason, it is difficult to relate the cultured cells to 
functional cells in the tissue from which they were 
derived. Stable markers are required for characteriza- 
tion (see Chapter 15); and in addition, the culture 
conditions may need to be modified so that these mark- 
ers are expressed (see next chapter). 

DEFINITIONS 

There are three main methods of initiating a culture 
[Schaeffer, 1979] (see Glossary and Fig. 1.2): (1) Or- 
gan culture implies that the architecture characteristic 
of the tissue in vivo is retained, at least in part, in the 
culture. Toward this end, the tissue is cultured at the 
liquid/gas interface (on a raft, grid, or gel) which 
favors retention of a spherical or three-dimensional 
shape. (2) In primary explant culture a fragment of 
tissue is placed at a glass (or plastic)/liquid interface
where, following attachment, migration is promoted 
in the plane of the solid substrate. (3) Cell culture 
implies that the tissue or outgrowth from the primary 
explant is dispersed (mechanically or enzymatically) 
into a cell suspension which may then be cultured as 
an adherent monolayer on a solid substrate, or as a 
suspension in the culture medium. 

Organ cultures, because of the retention of cell inter- 
actions as found in the tissue from which the culture 
was derived, tend to retain the differentiated properties 
of that tissue. They do not grow rapidly (cell prolifer- 
ation is limited to the periphery of the explant and is 
restricted mainly to embryonic tissue) and hence cannot 
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